BioBloom tools: fast, accurate and memory-efficient host species sequence screening using Bloom filters
Authors
Abstract
Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular alignment algorithms) and FACS (a membership query algorithm). It delivers accuracies comparable with these tools, controls false positives and has low memory requirements. Availability and implementation: www.bcgsc.ca/platform/bioinfo/software/biobloomtools.
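The screening idea in the abstract can be illustrated with a minimal sketch. This is not the BioBloom Tools implementation: the class `BloomFilter`, the helpers `kmers`, `build_filter` and `hit_fraction`, the SHA-256 double-hashing scheme, and the "fraction of k-mers found" score are all assumptions made for illustration. The sketch stores every k-mer of a reference sequence in a Bloom filter sized for a target false-positive rate, then scores a read by the fraction of its k-mers that the filter reports as present.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over byte strings (illustrative, not BioBloom's)."""

    def __init__(self, expected_items: int, fpr: float):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-expected_items * math.log(fpr) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing (an assumption of this sketch): derive every bit
        # position from two 64-bit values taken from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq as upper-case bytes."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k].encode()


def build_filter(reference: str, k: int, fpr: float = 0.01) -> BloomFilter:
    """Insert all k-mers of the reference into a filter sized for fpr."""
    bf = BloomFilter(max(1, len(reference) - k + 1), fpr)
    for km in kmers(reference, k):
        bf.add(km)
    return bf


def hit_fraction(read: str, bf: BloomFilter, k: int) -> float:
    """Fraction of the read's k-mers reported present (0.0 if read is shorter than k)."""
    total = hits = 0
    for km in kmers(read, k):
        total += 1
        hits += km in bf
    return hits / total if total else 0.0


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGGATCCGATATTAGCGCGTA"
    bf = build_filter(reference, k=8)
    # A read drawn from the reference always scores 1.0 (no false negatives).
    print(hit_fraction(reference[10:40], bf, k=8))
```

A read would then be assigned to the reference organism when its score exceeds a chosen threshold; the threshold and the k-mer length are tuning choices in this sketch, and lowering `fpr` trades memory for fewer spurious k-mer hits.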
Similar resources
Bloom Filters in Probabilistic Verification
Probabilistic techniques for verification of finite-state transition systems offer huge memory savings over deterministic techniques. The two leading probabilistic schemes are hash compaction and the bitstate method, which stores states in a Bloom filter. Bloom filters have been criticized for being slow, inaccurate, and memory-inefficient, but in this paper, we show how to obtain Bloom filters...
Data Caching in Ad Hoc Networks using Bloom Filters
Data caching provides efficient data access by maintaining replicas of data in strategic parts of the network. However, current research in this area does not manage memory space of each node efficiently. We propose an improvement by considering Bloom filters, a fast, space-efficient probabilistic method for looking up data. We compare the system performance with and without Bloom fil...
Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters
Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical...
Classification of DNA sequences using Bloom filters
MOTIVATION: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. RESULTS: A novel algorithm, fast and accurate classification of sequences (FACSs), is introduced that can a...
NAE-SAT-based probabilistic membership filters
Probabilistic membership filters are a type of data structure designed to quickly verify whether an element of a large data set belongs to a subset of the data. While false negatives are not possible, false positives are. Therefore, the main goal of any good probabilistic membership filter is to have a small false-positive rate while being memory efficient and fast to query. Although Bloom filt...
Journal:
- Bioinformatics
Volume 30, Issue 23
Pages -
Publication date 2014